In a nutshell, features of this dataset are sampled randomly from N(0,1), and the target is produced by an ensemble of randomly constructed decision trees applied to the sampled features. Our dataset has 10,000 objects, 8 features, and the target was produced by 16 decision trees of depth 6. CatBoost is trained with the default hyperparameters. Importantly, the latter means that this approach is not covered by the embedding framework described in subsection 3.1. So, it seems to be important to embed each feature separately, as described in subsection 3.1.
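For concreteness, a minimal sketch of this construction using NumPy and scikit-learn is shown below. The text does not spell out how the trees are "randomly constructed", so fitting each tree to random noise targets is an illustrative stand-in, not the authors' procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_objects, n_features, n_trees, depth = 10_000, 8, 16, 6

# Features are sampled i.i.d. from the standard normal distribution N(0, 1).
X = rng.standard_normal((n_objects, n_features))

# Build the target as an ensemble of randomly constructed depth-6 trees.
# Assumption: each "random" tree is obtained by fitting to random noise data.
target = np.zeros(n_objects)
for _ in range(n_trees):
    X_rand = rng.standard_normal((512, n_features))
    y_rand = rng.standard_normal(512)
    tree = DecisionTreeRegressor(max_depth=depth).fit(X_rand, y_rand)
    target += tree.predict(X)

# CatBoost with default hyperparameters would then be trained on (X, target):
# from catboost import CatBoostRegressor
# model = CatBoostRegressor().fit(X, target)
```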
Posterior Meta-Replay for Continual Learning
In principle, Bayesian learning directly applies to this setting, since recursive and one-off Bayesian updates yield the same result. In practice, however, recursive updating often leads to poor trade-off solutions across tasks because approximate inference is necessary for most models of interest. Here, we describe an alternative Bayesian approach where task-conditioned parameter distributions are continually inferred from data. We offer a practical deep learning implementation of our framework based on probabilistic task-conditioned hypernetworks, an approach we term posterior meta-replay. Experiments on standard benchmarks show that our probabilistic hypernetworks compress sequences of posterior parameter distributions with virtually no forgetting. We obtain considerable performance gains compared to existing Bayesian CL methods, and identify task inference as our major limiting factor. This limitation has several causes that are independent of the considered sequential setting, opening up new avenues for progress in CL.
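For intuition only, here is one way a probabilistic task-conditioned hypernetwork could be organized as a PyTorch-style sketch; the Gaussian posterior parameterization, module sizes, and names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TaskConditionedHypernet(nn.Module):
    """Maps a learned task embedding to a (Gaussian) posterior over the
    parameters of a main network, sampled via the reparameterization trick."""

    def __init__(self, n_tasks: int, emb_dim: int, n_main_params: int):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb_dim)   # one embedding per task
        self.body = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU())
        self.mean_head = nn.Linear(128, n_main_params)    # posterior mean
        self.logvar_head = nn.Linear(128, n_main_params)  # posterior log-variance

    def forward(self, task_id: torch.Tensor) -> torch.Tensor:
        h = self.body(self.task_emb(task_id))
        mean, logvar = self.mean_head(h), self.logvar_head(h)
        # Draw one sample of main-network parameters from the task posterior.
        return mean + torch.randn_like(mean) * torch.exp(0.5 * logvar)
```

In this sketch, each task's posterior can be regenerated from its small task embedding rather than stored explicitly, which reflects the abstract's claim that the hypernetwork compresses a sequence of posterior parameter distributions.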